Review for NeurIPS paper: On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Neural Information Processing Systems

Additional Feedback: Overall, I have a somewhat negative opinion of the paper. My main concerns are: (1) the related work is not well discussed; (2) the authors define a robust stability condition that essentially makes a critical intermediate term in the analysis easy to deal with, whereas a more reasonable assumption should be imposed on A, B, C. Post rebuttal: some potential references on RARL that should be included are: Extending robust adversarial reinforcement learning considering adaptation and diversity, Shioya et al., 2018; Adversarial reinforcement learning-based robust access point coordination against uncoordinated interference, Kihira et al., 2020; Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient, Li et al., 2019; Policy-gradient algorithms have no guarantees of convergence in linear quadratic games, Mazumdar et al., 2019; Policy iteration for linear quadratic games with stochastic parameters, Gravell et al., 2020; Risk averse robust adversarial reinforcement learning, Pan et al., 2019; Online robust policy learning in the presence of unknown adversaries, Havens et al., 2018.


Review for NeurIPS paper: On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Neural Information Processing Systems

This paper studies a recent method, Robust Adversarial Reinforcement Learning (RARL) by Pinto et al., in the linear quadratic setting (linear dynamics, quadratic cost function), a typical starting point in the analysis of optimal control algorithms. The paper examines the stabilization behavior of the linear controller, showing that RARL can exhibit instabilities even in this simplified linear quadratic setting. It then proposes a new formulation of RARL in the linear quadratic setting, which can inform solutions in the nonlinear setting, and provides stability guarantees for the proposed method. In the post-rebuttal discussion, three of four reviewers evaluated the paper highly and recommended acceptance. I agree that the paper makes a significant and interesting enough contribution in pointing out the instabilities of RARL and addressing them in the linear quadratic setting, which in my view is sufficient for publication at NeurIPS.


On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Neural Information Processing Systems

Reinforcement learning (RL) algorithms can fail to generalize due to the gap between the simulation and the real world. One standard remedy is to use robust adversarial RL (RARL), which accounts for this gap during policy training by modeling the gap as an adversary against the training agent. We first observe that the popular RARL scheme that greedily alternates the agents' updates can easily destabilize the system. Motivated by this, we propose several other policy-based RARL algorithms whose convergence behaviors are then studied both empirically and theoretically. We find: i) the conventional RARL framework (Pinto et al., 2017) can learn a destabilizing policy if the initial policy does not enjoy the robust stability property against the adversary; and ii) with robustly stabilizing initializations, our proposed double-loop RARL algorithm provably converges to the global optimal cost while maintaining robust stability on the fly.
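The greedy alternating scheme the abstract warns about can be sketched on a toy scalar linear quadratic game. Everything below (the dynamics constants, cost weights, step sizes, and finite-difference gradients) is an illustrative assumption, not taken from the paper:

```python
# Toy scalar LQ game x' = a*x + b*u + c*w: the agent picks u = -k*x to
# minimize a quadratic cost, the adversary picks w = l*x to maximize it.
# All constants below are illustrative, not from the paper.
a, b, c = 1.1, 1.0, 0.5      # open-loop unstable dynamics (|a| > 1)
q, r, s = 1.0, 1.0, 5.0      # state cost, control cost, adversary penalty

def closed_loop(k, l):
    """Closed-loop gain under agent feedback u = -k*x and adversary w = l*x."""
    return a - b * k + c * l

def cost(k, l, horizon=50, x0=1.0):
    """Finite-horizon game cost: agent minimizes, adversary maximizes."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u, w = -k * x, l * x
        total += q * x**2 + r * u**2 - s * w**2
        x = closed_loop(k, l) * x
    return total

# Greedy alternation: each player takes one finite-difference gradient
# step on the current cost before handing the turn to the other player.
k, l, lr, eps = 0.5, 0.0, 1e-3, 1e-4
for _ in range(200):
    gk = (cost(k + eps, l) - cost(k - eps, l)) / (2 * eps)
    k -= lr * gk                      # agent descends
    gl = (cost(k, l + eps) - cost(k, l - eps)) / (2 * eps)
    l += lr * gl                      # adversary ascends

print("final gains:", round(k, 3), round(l, 3))
```

Whether the resulting closed loop stays contractive depends on the initialization, which mirrors the abstract's point that greedy alternation carries no stability guarantee by itself.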


Finite Horizon Q-learning: Stability, Convergence and Simulations

VP, Vivek, Bhatnagar, Shalabh

arXiv.org Artificial Intelligence

Q-learning is a popular reinforcement learning algorithm. However, it has mainly been studied and analysed in the infinite-horizon setting, while several important applications are naturally modeled as finite-horizon Markov decision processes (MDPs). We develop a version of the Q-learning algorithm for finite-horizon MDPs and provide a full proof of its stability and convergence. Our analysis of the stability and convergence of finite-horizon Q-learning is based entirely on the ordinary differential equation (ODE) method. We also demonstrate the performance of our algorithm on randomly generated MDPs.
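A minimal tabular sketch of the finite-horizon scheme keeps one Q-table per stage with a zero terminal value. The toy random MDP, horizon, and constant step size below are illustrative assumptions (the paper's ODE-based analysis uses tapering step sizes, which this sketch does not reproduce):

```python
import random

# Finite-horizon Q-learning on a toy random MDP (2 states, 2 actions,
# horizon 3). All sizes and the step size are illustrative.
random.seed(0)
S, A, H = 2, 2, 3

# Random transition probabilities P[s][a] -> distribution over next states,
# and random (deterministic per state-action) rewards R[s][a].
P = [[[random.random() for _ in range(S)] for _ in range(A)] for _ in range(S)]
for s in range(S):
    for a in range(A):
        z = sum(P[s][a])
        P[s][a] = [p / z for p in P[s][a]]
R = [[random.random() for _ in range(A)] for _ in range(S)]

def step(s, a):
    """Sample the next state and return it with the reward."""
    u, acc = random.random(), 0.0
    for s2, p in enumerate(P[s][a]):
        acc += p
        if u <= acc:
            return s2, R[s][a]
    return S - 1, R[s][a]

# One Q-table per stage h = 0..H-1; the terminal value beyond stage H-1 is 0.
Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
alpha = 0.1
for episode in range(5000):
    s = random.randrange(S)
    for h in range(H):
        a = random.randrange(A)                      # uniform exploration
        s2, r = step(s, a)
        target = r + (max(Q[h + 1][s2]) if h + 1 < H else 0.0)
        Q[h][s][a] += alpha * (target - Q[h][s][a])  # stage-h update
        s = s2
```

The key structural difference from infinite-horizon Q-learning is visible in the update: the bootstrap target at stage h reads from the stage h+1 table rather than from the same table.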


Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed Point Iterations

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

arXiv.org Machine Learning

The main aim of this paper is the development of easily verifiable sufficient conditions for the stability (almost sure boundedness) and convergence of stochastic approximation algorithms (SAAs) with set-valued mean fields, a class of model-free algorithms that has become important in recent times. We provide a complete analysis of such algorithms under three different, yet related, sets of sufficient conditions, based on the existence of an associated global/local Lyapunov function. Unlike previous Lyapunov-function-based approaches, we provide a simple recipe for explicitly constructing the Lyapunov function needed for the analysis. Our work builds on the works of Abounadi, Bertsekas and Borkar (2002), Munos (2005), and Ramaswamy and Bhatnagar (2016). An important motivation for the flavor of our assumptions comes from the need to understand dynamic programming and reinforcement learning algorithms that use deep neural networks (DNNs) for function approximation and parameterization, popularly known as deep learning algorithms. As an important application of our theory, we provide a complete analysis of the stochastic approximation counterpart of approximate value iteration (AVI), an important dynamic programming method designed to tackle Bellman's curse of dimensionality. Further, the assumptions involved are significantly weaker, easily verifiable, and truly model-free. The theory presented in this paper is also used to develop and analyze the first SAA for finding fixed points of contractive set-valued maps.
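The basic single-valued stochastic approximation template behind this line of work is the iterate x_{n+1} = x_n + a_n (h(x_n) + M_{n+1}). A scalar sketch with mean field h(x) = -x, for which V(x) = x^2 is a global Lyapunov function, illustrates the template; the drift, noise model, and step sizes below are illustrative assumptions and deliberately avoid the set-valued case:

```python
import random

# Scalar stochastic approximation x_{n+1} = x_n + a_n (h(x_n) + M_{n+1})
# with drift h(x) = -x (Lyapunov function V(x) = x^2) and zero-mean
# Gaussian noise M_{n+1}. Constants are illustrative only.
random.seed(0)

x = 5.0
for n in range(1, 20001):
    a_n = 1.0 / n                        # standard tapering step sizes
    noise = random.gauss(0.0, 1.0)       # martingale-difference noise
    x = x + a_n * (-x + noise)           # drift pushes the iterate to 0

print(round(x, 3))
```

Under these conditions the iterates track the ODE dx/dt = -x and converge to its globally stable equilibrium at 0, which is the kind of behavior the paper's Lyapunov conditions certify in far greater generality.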


Stability of Stochastic Approximations with `Controlled Markov' Noise and Temporal Difference Learning

Ramaswamy, Arunselvan, Bhatnagar, Shalabh

arXiv.org Machine Learning

In this paper we present a `stability theorem' for stochastic approximation (SA) algorithms with `controlled Markov' noise; such algorithms were first studied by Borkar in 2006. Specifically, sufficient conditions are presented which guarantee the stability of the iterates. Further, under these conditions the iterates are shown to track a solution to the differential inclusion defined in terms of the ergodic occupation measures associated with the `controlled Markov' process. As an application of our main result, we present an improvement to a general form of temporal difference learning algorithms: specifically, we give sufficient conditions for their stability and convergence within our framework. This paper builds on the works of Borkar as well as Benveniste, Metivier and Priouret.
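As a concrete instance of the temporal difference application, the sketch below runs TD(0) value estimation on a toy two-state Markov reward process, where the driving `Markov noise' is the state sequence itself. The chain, rewards, discount, and constant step size are illustrative assumptions, not from the paper:

```python
import random

# TD(0) on a 2-state ergodic Markov reward process, viewed as a stochastic
# approximation driven by the Markov state sequence. All constants are
# illustrative; the paper's general stability conditions are not checked here.
random.seed(0)

P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities of the chain
R = [1.0, 0.0]                 # reward received on leaving each state
gamma = 0.9

V = [0.0, 0.0]
s = 0
alpha = 0.05
for _ in range(50000):
    s2 = 0 if random.random() < P[s][0] else 1
    delta = R[s] + gamma * V[s2] - V[s]    # TD error along the Markov sample
    V[s] += alpha * delta                  # SA step with Markov noise
    s = s2
```

For this chain the exact fixed point solves V = R + gamma * P * V, giving V[0] about 7.57 and V[1] about 4.86, and the iterates hover near those values despite never seeing the transition matrix directly.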